Extracting Idiomatic Hungarian Verb Frames

Author

  • Bálint Sass
Abstract

We describe a machine learning method for collecting idiomatic fixed-stem verb frames. First, we collect frequent frame candidates from the output of a partial parser; second, we apply an idiomaticity metric to the candidate list to select the most idiomatic frames. The extracted frames will be translated into English and used as a resource in a Hungarian-to-English machine translation system.
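The two steps in the abstract (collect frequent candidates, then rank them by idiomaticity) can be illustrated with a minimal sketch. The abstract does not name the metric or the parser output format, so everything here is an assumption: clauses are represented as hypothetical `(verb_stem, [(case, lemma), ...])` tuples, a candidate frame is a verb plus one lexically fixed dependent, and pointwise mutual information stands in for the unspecified idiomaticity metric. The function name `extract_idiomatic_frames` and the toy data are invented for illustration.

```python
import math
from collections import Counter


def extract_idiomatic_frames(clauses, min_freq=2, top_n=10):
    """Rank (verb, dependent) candidates by PMI.

    clauses: iterable of (verb_stem, [(case, lemma), ...]) tuples, a
    hypothetical stand-in for partial-parser output.
    PMI is an assumed metric; the source does not specify one.
    """
    verb_counts = Counter()
    filler_counts = Counter()
    frame_counts = Counter()
    total = 0
    for verb, dependents in clauses:
        total += 1
        verb_counts[verb] += 1
        # Count each distinct dependent once per clause.
        for filler in set(dependents):
            filler_counts[filler] += 1
            frame_counts[(verb, filler)] += 1

    scored = []
    for (verb, filler), freq in frame_counts.items():
        # Step 1: keep only frequent candidates.
        if freq < min_freq:
            continue
        # Step 2: score the candidate (here: pointwise mutual information).
        pmi = math.log2(freq * total / (verb_counts[verb] * filler_counts[filler]))
        scored.append(((verb, filler), freq, pmi))

    # Highest score first; ties broken by frequency, then alphabetically.
    scored.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return scored[:top_n]


# Toy data (invented): "figyelembe vesz" = "take into consideration".
clauses = [
    ("vesz", [("ILL", "figyelem"), ("ACC", "javaslat")]),
    ("vesz", [("ILL", "figyelem"), ("ACC", "vélemény")]),
    ("vesz", [("ACC", "kenyér")]),
    ("vesz", [("ACC", "kenyér")]),
    ("eszik", [("ACC", "kenyér")]),
    ("eszik", [("ACC", "kenyér")]),
    ("eszik", [("ACC", "alma")]),
    ("lát", [("ACC", "javaslat")]),
]
ranked = extract_idiomatic_frames(clauses)
for frame, freq, score in ranked:
    print(frame, freq, round(score, 3))
```

The frequency threshold runs before scoring because association scores computed from raw counts tend to overrate rare pairs; the threshold value itself is arbitrary in this sketch.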

Similar resources

A Unified Method for Extracting Simple and Multiword Verbs with Valence Information and Application for Hungarian

We present a method for extracting verb-centered constructions (VCCs) from corpora. In our framework, simple and multiword verbs, with or without valence, are all VCCs. They are treated uniformly, from, e.g., to breathe to, e.g., to take something into consideration. In order to extract VCCs we represent the corpus as a sequence of clauses that contain a verb together with all its NP dependents. Th...

Extracting Translation Verb Frames

We describe a method for extracting translation verb frames (parallel subcategorization frames) from a parallel dependency treebank. The extracted frames constitute an important part of the machine translation dictionary for a structural machine translation system. We evaluate our method independently, using a manually annotated test dataset, and conclude that the bottleneck of the method lies in q...

Using chunked corpora for the acquisition of collocations and idiomatic expressions

This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with ...

A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds

We present an annotation study on a representative dataset of literal and idiomatic uses of infinitive-verb compounds in German newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus res...

Publication date: 2006